Decimal quantization by kdeldycke · Pull Request #494 · python-babel/babel

kdeldycke · 2017-04-20T16:13:46Z

This PR adds a decimal_quantization parameter to format_decimal(), format_currency(), format_percent() and format_scientific(), to bypass the forced quantization of the fractional part. This PR definitely addresses #90 and is a continuation of #410.
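
To make "forced quantization" concrete, here is a minimal sketch that uses only the standard library `decimal` module rather than Babel itself. It assumes a pattern that allows at most three fractional digits; the helper name `render_fraction` is hypothetical and is not part of Babel's API.

```python
from decimal import Decimal


def render_fraction(value, max_frac_digits=3, decimal_quantization=True):
    """Return the value rounded to the pattern's precision, or untouched.

    Hypothetical helper: it only mimics the idea behind the new
    parameter and ignores grouping, locale symbols and optional digits.
    """
    number = Decimal(str(value))
    if not decimal_quantization:
        # Bypass: keep every fractional digit the number carries.
        return str(number)
    # Smallest unit allowed by the pattern, e.g. Decimal('0.001').
    quantum = Decimal(10) ** -max_frac_digits
    return str(number.quantize(quantum))


print(render_fraction(1.2346))                              # rounded to 3 digits
print(render_fraction(1.2346, decimal_quantization=False))  # left as-is
```

With quantization enabled the extra digit is rounded away; with it disabled the number keeps its natural precision.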

This PR has been extensively reviewed and is waiting for #538 to be merged, so I can rebase this one on top of it.

codecov-io · 2017-05-18T06:45:43Z

Codecov Report

Merging #494 into master will increase coverage by 0.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master     #494      +/-   ##
==========================================
+ Coverage   90.04%   90.06%   +0.01%     
==========================================
  Files          24       24              
  Lines        4039     4046       +7     
==========================================
+ Hits         3637     3644       +7     
  Misses        402      402

Impacted Files	Coverage Δ
babel/numbers.py	`98.01% <100%> (+0.04%)`	⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Last update cf03578...f5e0a3a.

kdeldycke · 2017-05-18T06:53:35Z

This PR was previously based on #491 and partially duplicated it. Now that the latter has been merged, I rebased and squashed the whole code on top of the current master branch.

This PR is ready to be reviewed and merged.

benselme

Better docs needed
Improved commit message also needed

Not easy to review, but I think I have found some bugs using some data from ICU4J. The '#E0' pattern which is used in many locales for scientific notation should not truncate the fractional part.

Ex: for the 'es' locale: ICU gives "1,23445678E3", while babel gives "1E3".

I checked with other libraries, and the Elixir CLDR one gives the same result as ICU. I think it's a special case but I cannot find it in the specs.

There are other differences but all of them seem to stem from the fact that we ignore any tag with the 'draft' attribute.

benselme · 2017-05-22T22:23:25Z

babel/numbers.py

-def format_decimal(number, format=None, locale=LC_NUMERIC):
+def format_decimal(
+        number, format=None, locale=LC_NUMERIC,
+        decimal_quantization=True):


Doc for decimal_quantization is missing. What does it do, exactly ?

Done in d89365a.

benselme · 2017-05-22T22:24:25Z

babel/numbers.py

-                    currency_digits=True, format_type='standard'):
+def format_currency(
+        number, currency, format=None, locale=LC_NUMERIC, currency_digits=True,
+        format_type='standard', decimal_quantization=True):


Doc for decimal_quantization is missing here too.

Done in d89365a.

benselme · 2017-05-22T22:25:56Z

babel/numbers.py

-def format_percent(number, format=None, locale=LC_NUMERIC):
+
+def format_percent(
+        number, format=None, locale=LC_NUMERIC, decimal_quantization=True):


Same as above, undocumented parameter of a public API.

Done in d89365a.

benselme · 2017-05-22T22:26:55Z

babel/numbers.py


-def format_scientific(number, format=None, locale=LC_NUMERIC):
+def format_scientific(
+        number, format=None, locale=LC_NUMERIC, decimal_quantization=True):


Doc for param missing, same as above

Done in d89365a.

benselme · 2017-05-22T22:36:33Z

babel/numbers.py

+        detected in the pattern. Default is to not mess with the scale at all.
+        """
+        scale = 0
+        if '%' in ''.join(self.prefix + self.suffix):


The call to join is useless here. A simple concatenation would be enough.

Done in 7e15be9.

The call to join is indeed required, as prefix and suffix are lists of strings. Removing the join introduces a regression. See: https://travis-ci.org/python-babel/babel/jobs/240689769#L1007

I reverted the change in 4887fdf. All unit tests pass now.
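
The difference between the two checks can be sketched as follows. The affix values are hypothetical (one entry for positive and one for negative numbers, with a non-breaking space before the percent sign); they are not taken from Babel's locale data.

```python
# Hypothetical affixes for a pattern that puts a padded percent sign
# after the number.
prefix = ['', '-']
suffix = ['\xa0%', '\xa0%']

# Membership test on the concatenated list: it looks for an element
# that is exactly '%', so the padded suffix is missed.
found_in_list = '%' in prefix + suffix

# Substring test on the joined string: it finds '%' anywhere in the
# affixes.
found_in_string = '%' in ''.join(prefix + suffix)

print(found_in_list, found_in_string)
```

The list test only succeeds when an affix is exactly the percent sign, which would explain why dropping the join broke some cases and not others.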

benselme · 2017-05-22T22:36:47Z

babel/numbers.py

+        scale = 0
+        if '%' in ''.join(self.prefix + self.suffix):
+            scale = 2
+        elif u'‰' in ''.join(self.prefix + self.suffix):


Useless join also.

Done in 7e15be9.

The call to join is indeed required, as prefix and suffix are lists of strings. Removing the join introduces a regression. See: https://travis-ci.org/python-babel/babel/jobs/240689769#L1007

I reverted the change in 4887fdf. All unit tests pass now.

Ok, sorry about this one !

No worries! :)

kdeldycke · 2017-05-29T00:20:46Z

Thanks @benselme for the reviewing effort! I'll address all your comments in one or two weeks as I'm on holidays right now! 😎

kdeldycke · 2017-06-08T08:00:06Z

I just addressed all coding and document issues. The PR has been squashed into one commit.

@benselme : I know this PR is not easy to review. To ease the pain I took the time to write expansive tests, searching for edge-cases and undefined situations.

I'm really happy with the results as I found a way to handle generically any rendering pattern.

Now the only remaining issue concerns the #E0 situation, which breaks the consistency of the pattern rendering. What should I do here to have this PR accepted?

benselme · 2017-06-11T19:11:57Z

Thanks a lot for your work.

What's here looks good to me. But I think the #E0 issue needs to be solved before a merge can take place, since it makes scientific formatting useless for dozens of languages. I would write a special case for it, with proper comments and test cases. I think that's what the cldr elixir package does, but I very well might be wrong. I had a quick look in ICU4J but could not locate the relevant code.

I would also love to have @etanol 's opinion on all this since he's the other person who recently did some work on numbers.

PS: the CLDR-users mailing list is a very useful resource.

etanol · 2017-06-12T08:28:49Z

Sorry to respond so late, but I haven't been involved in Python development for a while.

The changes look good, although they are a bit difficult to review because refactoring and new functionality are not in separate commits. If you have the time, please consider splitting this PR into two commits; otherwise it's very confusing to tell apart code movements from actual new functionality or bug fixing.

I also have a couple of other comments for specific snippets in case you want to address them.

Overall, I'm okay with the feature as long as it doesn't break backwards compatibility.

etanol · 2017-06-12T08:33:58Z

babel/numbers.py

-    def apply(self, value, locale, currency=None, force_frac=None):
-        frac_prec = force_frac or self.frac_prec
+    @property
+    def scale(self):


Turning a precomputed field into a property is going to have a small performance impact. My first motivation to contribute to this module was a use case we had, where currency formatting was consuming a lot of CPU time. This is not Java, so we don't have a JIT that can inline this method call.

If you have time, try to do some trivial benchmarking by comparing against the parent commit. Think about formatting hundreds of thousands of values.

This could use the @cached_property decorator.

I do not want to add other module dependencies so I abandoned the idea of reusing a @cached_property decorator.

Instead I simply kept the original method to keep the code readable and documented. See how the scaling value is now computed once since commit 7de4329.
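
A dependency-free way to avoid recomputing the value on every access is to evaluate it once at construction time. The class below is a simplified, hypothetical stand-in for the pattern object and does not reproduce the code of commit 7de4329.

```python
class PatternSketch:
    """Hypothetical stand-in for a number pattern with affixes."""

    def __init__(self, prefix, suffix):
        self.prefix = prefix
        self.suffix = suffix
        # Computed once here instead of on every attribute access.
        self.scale = self.compute_scale()

    def compute_scale(self):
        """Return the power of ten implied by a percent or per-mille sign."""
        affixes = ''.join(self.prefix + self.suffix)
        scale = 0
        if '%' in affixes:
            scale = 2
        elif '\u2030' in affixes:  # per-mille sign
            scale = 3
        return scale


percent = PatternSketch(('', '-'), ('%', '%'))
plain = PatternSketch(('', '-'), ('', ''))
print(percent.scale, plain.scale)
```

Formatting many values with the same pattern then reads a plain attribute, while the documented method stays available for readers of the code.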

etanol · 2017-06-12T08:46:28Z

babel/numbers.py

+        Forced decimal quantization is active by default so we'll produce a
+        number string that is strictly following CLDR pattern definitions.
+        """
+        # Manipulates decimal instances only.


Most of the one-line comments in this method are not very useful. I understand and appreciate the aesthetics, but in practice they don't provide any value. I think it's better to add small paragraphs near the complex chunks of code, explaining the abstract ideas that lead to them.

I stand by the principle that you can't have too much documentation, but I don't really want to nit-pick here. Also note that this commenting style was originally there and I simply tried to keep the spirit of the current code base.

Anyway I made the necessary changes to remove one-line comments in commit 22b8f81.

etanol · 2017-06-12T08:49:31Z

babel/numbers.py

+        if not decimal_quantization:
+            # Bump decimal precision to the natural precision of the number if
+            # it exceeds the one we're about to use.
+            frac_prec = (frac_prec[0], max([frac_prec[1], self.precision(value)]))


So this is the core of the PR, right? As I mentioned in the discussion, it would be much better to move this (and all dependent changes) into a separate commit. This will let reviewers choose their granularity (since the GitHub UI also allows reviewing all commits combined), and will ease potential bisections in the future.

It is the core of the PR which is about bypassing decimal quantization. And I'll be happy to split it into another PR.

Because the code is currently highly intertwined, I'd rather complete the review process of the whole monolithic PR in its present form before attempting a code split.

I think we are ready to see that split now. And I'm going to insist more on separating refactoring from the real decimal quantization optional behavior code. It will help potential future bisections.

Thanks @etanol and sorry for the delay.

I just spliced it up to #494.
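
The precision bump quoted at the top of this thread can be sketched in isolation. The function names are hypothetical, the sketch handles finite decimals only, and `frac_prec` is assumed to be a (minimum, maximum) pair of fractional digits.

```python
from decimal import Decimal


def natural_precision(number):
    """Count the fractional digits a finite Decimal actually carries."""
    exponent = number.normalize().as_tuple().exponent
    return 0 if exponent >= 0 else abs(exponent)


def effective_frac_prec(frac_prec, value, decimal_quantization=True):
    """Return the (min, max) fractional digits to render with."""
    if decimal_quantization:
        # Default behaviour: stick to what the pattern defines.
        return frac_prec
    # Bypass: widen the maximum if the number carries more digits.
    return (frac_prec[0], max(frac_prec[1], natural_precision(value)))


print(effective_frac_prec((0, 3), Decimal('1.23456')))
print(effective_frac_prec((0, 3), Decimal('1.23456'), decimal_quantization=False))
print(effective_frac_prec((0, 3), Decimal('1.2'), decimal_quantization=False))
```

The maximum is only ever widened, so a number with fewer digits than the pattern allows is rendered exactly as before.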

akx · 2017-06-12T10:12:31Z

babel/numbers.py

-    def apply(self, value, locale, currency=None, force_frac=None):
-        frac_prec = force_frac or self.frac_prec
+    @property
+    def scale(self):


This could use the @cached_property decorator.

akx · 2017-06-12T10:14:03Z

babel/numbers.py

+        return scale
+
+    @staticmethod
+    def precision(number):


I think method names should contain a verb – perhaps this should be get_precision()?

Also, instead of a static method, consider making it a free function?

I renamed the method and made it a free function in commit 8c70916.

akx · 2017-06-12T10:15:10Z

babel/numbers.py

+        return abs(decimal_tuple.exponent)
+
+    @staticmethod
+    def quantum(precision):


The comments for the precision method apply here too :)

Done in commit dba70a5.
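
As a free function with a verb in its name, the quantum helper could look like the sketch below. The name `get_quantum` is an assumption made for illustration; the thread does not state the final name.

```python
from decimal import Decimal


def get_quantum(precision):
    """Return the smallest unit for a given number of fractional digits."""
    return Decimal(10) ** -precision


# The quantum is what Decimal.quantize() rounds to.
print(get_quantum(0), get_quantum(2))
print(Decimal('2.675').quantize(get_quantum(2)))
```

A module-level function like this can be imported and tested without instantiating a pattern object.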

akx · 2017-06-12T10:17:10Z

tests/test_numbers.py

 from datetime import date

-from babel import numbers
+from babel.numbers import (


Isn't this a repeat of line 23, below?

Good catch! Removed in commit 7a8c1f2.

akx · 2017-06-12T10:19:34Z

tests/test_numbers.py

+
+def test_format_decimal_quantization():
+    # Test precision conservation.
+    test_data = [


You can use @pytest.mark.parametrize here:

    @pytest.mark.parametrize('input_value, expected', [
        ('10000', '10,000'),
        # (etc, etc.)
    ])
    def test_format_decimal_quantization(input_value, expected):
        # ...

This has the advantage that each of these cases will show up (and be addressable) as a separate test case, so if one of those fails, the reader will not have to read the locals to figure out which particular case failed. :)

Done in commit b882589.

akx · 2017-06-12T10:19:56Z

tests/test_numbers.py

            == u'1.099,98')


+def test_format_currency_quantization():


Same parametrize comment applies here.

Done in commit b882589.

akx · 2017-06-12T10:20:04Z

tests/test_numbers.py


-def test_scientific_exponent_displayed_as_integer():
-    assert numbers.format_scientific(100000, locale='en_US') == u'1E5'
+def test_format_percent_quantization():


And parametrize this too :)

Done in commit b882589.

akx · 2017-06-12T10:20:13Z

tests/test_numbers.py

+    assert numbers.format_scientific(42, u'00000.000000E0000', locale='en_US') == u'42000.000000E-0003'
+
+
+def test_format_scientific_quantization():


And this oughta get parametrized too :D

Done in commit b882589.

akx · 2017-06-12T10:22:44Z

babel/numbers.py

+        value = abs(value).normalize()
+
+        # Prepare scientific notation metadata.
+        if self.exp_prec:


There's big chunks of scientific notation code interspersed with more "regular" rendering here – would you think the exp_prec bits could maybe be refactored into separate functions?

Done in commit e66a4c1.

akx · 2017-06-12T10:23:43Z

tests/test_numbers.py

        # Exponent grouping
        fmt = numbers.format_scientific(12345, '##0.####E0', locale='en_US')
-        self.assertEqual(fmt, '12.345E3')
+        self.assertEqual(fmt, '1.2345E4')


So was this a bug, or a deliberate change in output?

That is a deliberate change in output to fix what I think is a bug. The ##0.####E0 pattern implies only the first digit before the decimal separator is required.
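
Both renderings denote the same value and differ only in how many integer digits the mantissa keeps. The sketch below shows the arithmetic with the standard library; `split_scientific` and its `exponent_multiple` parameter are hypothetical, and the sketch does not settle which output the pattern should produce.

```python
from decimal import Decimal


def split_scientific(value, exponent_multiple=1):
    """Split a non-zero number into (mantissa, exponent).

    With exponent_multiple=1 the mantissa has one integer digit; with
    3 the exponent is forced down to a multiple of three.
    """
    number = Decimal(value)
    exponent = number.adjusted()  # exponent of the leading digit
    exponent -= exponent % exponent_multiple
    mantissa = number.scaleb(-exponent)
    return mantissa, exponent


for multiple in (1, 3):
    mantissa, exponent = split_scientific(12345, multiple)
    print('{}E{}'.format(mantissa, exponent))
```

The first line printed corresponds to the new expected value in the test, the second to the old one.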

kdeldycke · 2017-06-29T15:28:15Z

@benselme: I just addressed the issue of the default #E0 rendering pattern in commit b275690. The current PR now renders scientific notation like the CLDR elixir package and ICU4J do.
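
The behaviour being matched can be illustrated without Babel. This sketch assumes the input behind the 'es' example reported earlier was 1234.45678 and ignores the locale's decimal separator; `render_e0` is a hypothetical helper, not the code of commit b275690.

```python
from decimal import Decimal


def render_e0(value, keep_fraction=True):
    """Render a non-zero number in the style of the '#E0' pattern."""
    number = Decimal(str(value))
    exponent = number.adjusted()         # exponent of the leading digit
    mantissa = number.scaleb(-exponent)  # one integer digit
    if not keep_fraction:
        # A pattern without fractional digits, applied literally.
        mantissa = mantissa.quantize(Decimal(1))
    return '{}E{}'.format(mantissa, exponent)


print(render_e0(1234.45678, keep_fraction=False))  # fraction truncated
print(render_e0(1234.45678))                       # fraction preserved
```

Applying the pattern literally discards every significant digit after the first, which is why the special case preserves the fraction.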

kdeldycke · 2017-06-29T15:32:16Z

I just finished addressing all pending comments of this PR.

What needs to be done now:

Have @etanol, @akx and @benselme do a final pass on the PR and eventually lift their review process.
Done: Squash all commits into one.
Split the PR into two: one for decimal quantization, and one for all the other refactoring and bug fixes.

benselme · 2017-07-03T17:52:26Z

babel/numbers.py

+        # triggered if the decimal quantization is disabled or if a scientific
+        # notation pattern has a missing mandatory fractional part (as in the
+        # default '#E0' pattern). This special case has been extensively
+        # disccused at https://github.com/python-babel/babel/pull/494#issuecomment-307649969 .


Small typo here discussed not disccused

Good catch! Fixed in 6d0eabc.

benselme · 2017-07-03T17:54:43Z

Thanks ! I'll try and have a look at your changes this week.

benselme · 2017-07-16T17:02:11Z

Sorry, weeks go by and I've been quite busy. Not forgetting about this PR though. Thanks for your patience.

kdeldycke · 2017-07-18T07:46:49Z

Thanks @benselme for the feedback! I know this PR is kind of tough so take your time.

kdeldycke · 2017-08-18T12:36:51Z

Any news on that one? :)

kdeldycke · 2017-10-16T09:39:47Z

I just finished splitting this PR. #538 is now waiting to be merged so I can finally rebase this one on top of it to add proper decimal quantization control.

kdeldycke · 2017-10-18T10:09:35Z

This PR has been rebased on top of the latest master branch and is ready to be merged! :)

etanol · 2017-10-23T19:55:30Z

I think three years is enough proof of patience and perseverance. Thanks @kdeldycke for having both.

kdeldycke · 2017-10-24T07:27:55Z

Thanks @etanol for the merge! 😃

gitmate-bot added process/pending review pending review size/XL and removed process/pending review labels Apr 20, 2017

kdeldycke mentioned this pull request Apr 24, 2017

Bypass number rounding #90

Closed

gitmate-bot added size/XXL and removed size/XL labels May 18, 2017

gitmate-bot added size/XL and removed size/XXL labels May 18, 2017

kdeldycke mentioned this pull request May 21, 2017

Currency symbol parsing code should ignore alternate symbols #397

Closed

benselme requested changes May 27, 2017

etanol requested changes Jun 12, 2017

akx requested changes Jun 12, 2017

benselme reviewed Jul 3, 2017

kdeldycke mentioned this pull request Oct 16, 2017

Decimal refactor #538

Merged

Allow bypass of decimal quantization.

f5e0a3a

kdeldycke mentioned this pull request Oct 18, 2017

Currency normalization #478

Open

etanol approved these changes Oct 23, 2017

etanol merged commit 1539c8a into python-babel:master Oct 23, 2017

kdeldycke deleted the decimal-quantization branch October 24, 2017 07:27

akx mentioned this pull request Jan 16, 2018

Add information about precision on format_percent #460

Closed

akx mentioned this pull request May 7, 2018

Restore force_frac to NumberPattern.apply() (as deprecated) #577

Merged

akx mentioned this pull request Jun 27, 2023

Allow precision specification. #1010

Closed
